The multiplicative weights update method is an algorithmic technique most commonly used for decision making and prediction, and also widely deployed in game theory and algorithm design. The simplest use case is the problem of prediction from expert advice, in which a decision maker needs to iteratively decide on an expert whose advice to follow. The method assigns initial weights to the experts (usually identical initial weights), and updates these weights multiplicatively and iteratively according to feedback on how well an expert performed: reducing an expert's weight in case of poor performance, and increasing it otherwise. It was discovered repeatedly in very diverse fields such as machine learning (AdaBoost, Winnow, Hedge), optimization (solving linear programs), theoretical computer science (devising fast algorithms for LPs and SDPs), and game theory.
The multiplicative weights algorithm is also widely applied in computational geometry, for example in Kenneth Clarkson's algorithm for linear programming (LP) with a bounded number of variables in linear time (Kenneth L. Clarkson. A Las Vegas algorithm for linear programming when the dimension is small. In Proc. 29th FOCS, pp. 452–456. IEEE Comp. Soc. Press, 1988. doi:10.1109/SFCS.1988.21961; Kenneth L. Clarkson. A Las Vegas algorithm for linear and integer programming when the dimension is small. Journal of the ACM, 42:488–499, 1995. doi:10.1145/201019.201036). Later, Brönnimann and Goodrich employed analogous methods to find set covers for hypergraphs with small VC dimension (preliminary version in 10th Ann. Symp. Comp. Geom. (SCG'94)).
In operations research and online statistical decision making, the weighted majority algorithm and its more complicated versions have been found independently.
In computer science, some researchers have previously observed the close relationships between multiplicative update algorithms used in different contexts. Young discovered the similarities between fast LP algorithms and Raghavan's method of pessimistic estimators for derandomization of randomized rounding algorithms; Klivans and Servedio linked boosting algorithms in learning theory to proofs of Yao's XOR Lemma; Garg and Khandekar defined a common framework for convex optimization problems that contains Garg-Konemann and Plotkin-Shmoys-Tardos as subcases.
The Hedge algorithm is a special case of mirror descent.
Unlike the halving algorithm, which dismisses experts who have made mistakes, the weighted majority algorithm discounts their advice. Given the same "expert advice" setup, suppose we have n decisions, and we need to select one decision for each loop. In each loop, every decision incurs a cost, and all costs are revealed after making the choice. The cost is 0 if the expert is correct, and 1 otherwise. The algorithm's goal is to limit its cumulative losses to roughly the same as the best of the experts.
A first algorithm that simply makes its choice by a majority vote in every iteration does not work, since the majority of the experts can be wrong consistently every time. The weighted majority algorithm corrects this trivial algorithm by keeping a weight for each expert instead of fixing the cost at either 1 or 0. It makes fewer mistakes than the halving algorithm.
If <math>\eta = 0</math>, the weight of the expert's advice will remain the same. When <math>\eta</math> increases, the weight of the expert's advice will decrease. Note that some researchers fix <math>\eta = \tfrac{1}{2}</math> in the weighted majority algorithm.
After <math>T</math> steps, let <math>m_i^T</math> be the number of mistakes of expert i and <math>M^T</math> be the number of mistakes our algorithm has made. Then we have the following bound for every <math>i</math>:

<math>M^T \le 2(1+\eta)\, m_i^T + \frac{2\ln n}{\eta}.</math>
In particular, this holds for the i which is the best expert. Since the best expert will have the least <math>m_i^T</math>, it gives the best bound on the number of mistakes made by the algorithm as a whole.
Given the same setup with N experts, consider the special situation where the proportions of experts predicting positive and negative, counting the weights, are both close to 50%. Then, there might be a tie. Following the weight update rule of the weighted majority algorithm, the predictions made by the algorithm would be randomized. The algorithm calculates the probabilities of experts predicting positive or negative, and then makes a random decision based on the computed fraction:
predict <math>f(x) = 1</math> with probability <math>\frac{q_1}{W}</math>, and <math>f(x) = 0</math> otherwise,
where <math>q_1</math> is the total weight of the experts predicting 1 and <math>W = \sum_i w_i</math> is the total weight of all experts.
The number of mistakes made by the randomized weighted majority algorithm is bounded as:

<math>E[\#\text{mistakes of the learner}] \le \alpha_\beta\,(\#\text{mistakes of the best expert}) + c_\beta \ln(N),</math>

where <math>\alpha_\beta = \frac{\ln(1/\beta)}{1-\beta}</math> and <math>c_\beta = \frac{1}{1-\beta}</math>.
Note that only the learning algorithm is randomized. The underlying assumption is that the examples and the experts' predictions are not random. The only randomness is the randomness the learner uses to make its own prediction.
In this randomized algorithm, <math>\alpha_\beta \to 1</math> as <math>\beta \to 1</math>. Compared to the weighted majority algorithm, this randomness halves the number of mistakes the algorithm is going to make. However, it is important to note that in some research, people define <math>\eta = \tfrac{1}{2}</math> in the weighted majority algorithm and allow <math>\eta</math> to vary in the randomized weighted majority algorithm.
Suppose we were given the distribution <math>P</math> on experts. Let <math>A</math> be the payoff matrix of a finite two-player zero-sum game, with <math>n</math> rows.
When the row player <math>p_r</math> uses plan <math>i</math> and the column player <math>p_c</math> uses plan <math>j</math>, the payoff of player <math>p_c</math> is <math>A(i,j) := A_{ij}</math>, assuming <math>A(i,j) \in [0,1]</math>.
If player <math>p_r</math> chooses action <math>i</math> from a distribution <math>P</math> over the rows, then the expected result for player <math>p_c</math> selecting action <math>j</math> is <math>A(P,j) = E_{i \sim P}[A(i,j)]</math>.
To maximize <math>A(P,j)</math>, player <math>p_c</math> should choose plan <math>j</math>. Similarly, the expected payoff for player <math>p_r</math> is <math>A(i,Q) = E_{j \sim Q}[A(i,j)]</math>. Choosing plan <math>i</math> would minimize this payoff. By John von Neumann's min-max theorem, we obtain:

<math>\min_P \max_j A(P,j) = \max_Q \min_i A(i,Q)</math>
Then, let <math>\lambda^*</math> denote the common value of the above quantities, also called the "value of the game". Let <math>\delta > 0</math> be an error parameter. To solve the zero-sum game bounded by an additive error of <math>\delta</math>, we look for mixed strategies <math>p</math> and <math>q</math> such that

<math>\lambda^* - \delta \le \min_i A(i, q)</math>

<math>\max_j A(p, j) \le \lambda^* + \delta.</math>
So there is an algorithm solving a zero-sum game up to an additive factor of δ using <math>O(\log n / \delta^2)</math> calls to ORACLE, with an additional processing time of <math>O(n)</math> per call.
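The scheme above can be written out concretely. The following is a minimal sketch, assuming a payoff matrix with entries in [0, 1] and using the column player's best response as the ORACLE; the function name, learning rate, and iteration count are illustrative choices rather than the tuned constants of the analysis.

<syntaxhighlight lang="python">
import numpy as np

def approx_zero_sum(A, delta):
    """Approximate the value of a zero-sum game with payoffs A[i, j] in [0, 1].

    The row player runs multiplicative weights over its rows; the ORACLE is
    the column player best-responding to the current row mixture. Returns an
    approximately optimal row strategy and the empirical value of the game.
    """
    n, _ = A.shape
    eta = delta / 2.0                             # learning rate tied to the target error
    T = int(np.ceil(4.0 * np.log(n) / delta**2))  # number of oracle calls
    w = np.ones(n)                                # one weight per row (pure strategy)
    avg_p = np.zeros(n)
    value = 0.0
    for _ in range(T):
        p = w / w.sum()                  # current mixed strategy of the row player
        j = int(np.argmax(p @ A))        # ORACLE: column player's best response to p
        value += p @ A[:, j]
        avg_p += p
        w *= (1.0 - eta) ** A[:, j]      # shrink rows that paid the column player a lot
    return avg_p / T, value / T
</syntaxhighlight>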
Bailey and Piliouras showed that although the time-average behavior of multiplicative weights update converges to Nash equilibria in zero-sum games, the day-to-day (last iterate) behavior diverges away from it (Bailey, James P., and Georgios Piliouras. "Multiplicative weights update in zero-sum games." Proceedings of the 2018 ACM Conference on Economics and Computation. ACM, 2018).
Given <math>m</math> labeled examples <math>(a_1, l_1), \ldots, (a_m, l_m)</math>, where <math>a_j \in \mathbb{R}^n</math> are feature vectors and <math>l_j \in \{-1, +1\}</math> are their labels.
The aim is to find non-negative weights such that for all examples, the sign of the weighted combination of the features matches its label. That is, require that <math>l_j \sum_i w_i a_{ji} \ge 0</math> for all <math>j</math>. Without loss of generality, assume the total weight is 1 so that the weights form a distribution. Thus, for notational convenience, redefine <math>a_{ji}</math> to be <math>l_j a_{ji}</math>; the problem reduces to finding a solution to the following LP:
This is the general form of an LP.
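As a concrete illustration, here is a minimal multiplicative-weights sketch for this feasibility problem, assuming the labels have already been folded into the feature vectors as described above; the step size, iteration budget, and width scaling are illustrative rather than taken from a specific analysis.

<syntaxhighlight lang="python">
import numpy as np

def find_separating_weights(A, eta=0.1, max_iters=1000):
    """Search for w >= 0 with sum(w) = 1 and A @ w >= 0, where row j of A is
    the feature vector of example j with its label already multiplied in."""
    m, n = A.shape
    rho = np.abs(A).max()              # crude width bound used to scale the update
    w = np.ones(n) / n                 # start from the uniform distribution over features
    for _ in range(max_iters):
        margins = A @ w
        j = int(np.argmin(margins))    # most violated example, if any
        if margins[j] >= 0:
            return w                   # every example has the correct sign
        w *= (1.0 + eta) ** (A[j] / rho)   # boost features that fix the violated example
        w /= w.sum()                   # renormalize to keep a distribution
    return None                        # no feasible w found within the iteration budget
</syntaxhighlight>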
The hedge algorithm is similar to the weighted majority algorithm. However, their exponential update rules are different.
It is generally used to solve the problem of binary allocation, in which we need to allocate different portions of resources among N different options. The loss with every option is available at the end of every iteration. The goal is to reduce the total loss suffered for a particular allocation. The allocation for the following iteration is then revised, based on the total loss suffered in the current iteration, using multiplicative update.
Initialization: Fix an <math>\eta > 0</math>. For each expert, associate the weight <math>w_i^1 := 1</math>.
For t=1,2,...,T:
If there exists an x satisfying (1), then x satisfies (2) for every distribution <math>p</math> over the rows. The contrapositive of this statement is also true.
Suppose that whenever the oracle returns a feasible solution for a <math>p</math>, the solution <math>x</math> it returns has bounded width <math>\max_i |(Ax)_i - b_i| \le \rho</math>.
So if there is a solution to (1), then there is an algorithm whose output x satisfies the system (2) up to an additive error of <math>2\varepsilon</math>. The algorithm makes at most <math>O\!\left(\frac{\rho^2 \ln(m)}{\varepsilon^2}\right)</math> calls to a width-bounded oracle for the problem (2). The contrapositive stands true as well. Multiplicative updates are applied in the algorithm in this case.
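A minimal sketch of this reduction is given below, assuming a caller-supplied oracle(p, A, b) that either returns some x with pᵀAx ≥ pᵀb or returns None to certify that no such x exists; the learning rate and the number of iterations are illustrative choices.

<syntaxhighlight lang="python">
import numpy as np

def approx_lp_feasibility(A, b, oracle, eps, rho):
    """Search for x with A @ x >= b - eps using a width-rho oracle for (2).

    oracle(p, A, b) must return an x with p @ (A @ x) >= p @ b, or None.
    rho is assumed to bound |(A @ x)_i - b_i| for every returned x.
    """
    m = A.shape[0]
    eta = eps / (2.0 * rho)
    T = int(np.ceil(4.0 * rho**2 * np.log(m) / eps**2))
    w = np.ones(m)                     # one weight per constraint (row of A)
    xs = []
    for _ in range(T):
        p = w / w.sum()
        x = oracle(p, A, b)            # solve the relaxed problem (2) for this p
        if x is None:
            return None                # (2) infeasible for some p, so (1) is infeasible
        slack = (A @ x - b) / rho      # positive where the constraint is satisfied
        w *= np.exp(-eta * slack)      # raise the weight of violated constraints
        xs.append(x)
    return np.mean(xs, axis=0)         # the average iterate approximately satisfies (1)
</syntaxhighlight>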
General setup
Algorithm analysis
Halving algorithm
Weighted majority algorithm
'''Initialization''':
Fix an <math>\eta \le \tfrac{1}{2}</math>. For each expert, associate the weight <math>w_i^1 := 1</math>.
'''For''' <math>t = 1, 2, \ldots, T</math>:
'''1'''. Make the prediction given by the weighted majority of the experts' predictions based on their weights. That is, choose 0 or 1 depending on which prediction has a higher total weight of experts advising it (breaking ties arbitrarily).
'''2'''. For every expert i that predicted wrongly, decrease its weight for the next round by multiplying it by a factor of <math>(1-\eta)</math>:
<math>w_i^{t+1} = (1-\eta)\, w_i^t</math> (update rule).
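The pseudocode above translates directly into code. Below is a minimal sketch for 0/1 predictions; the data layout and function name are illustrative.

<syntaxhighlight lang="python">
def weighted_majority(expert_preds, outcomes, eta=0.5):
    """Deterministic weighted majority over binary experts.

    expert_preds[t][i] is expert i's 0/1 prediction in round t, and
    outcomes[t] is the true 0/1 label revealed after the algorithm predicts.
    Returns the number of mistakes the algorithm makes.
    """
    n = len(expert_preds[0])
    w = [1.0] * n                      # initialization: every expert starts with weight 1
    mistakes = 0
    for preds, y in zip(expert_preds, outcomes):
        weight_one = sum(wi for wi, p in zip(w, preds) if p == 1)
        weight_zero = sum(w) - weight_one
        guess = 1 if weight_one >= weight_zero else 0   # weighted majority vote
        if guess != y:
            mistakes += 1
        # update rule: shrink the weight of every expert that predicted wrongly
        w = [wi * (1 - eta) if p != y else wi for wi, p in zip(w, preds)]
    return mistakes
</syntaxhighlight>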
Randomized weighted majority algorithm
<math>f(x) = \begin{cases}1 & \text{with probability } \frac{q_1}{W}\\0 & \text{otherwise.}\end{cases}</math>
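A short sketch of the randomized prediction step, assuming the weights w and the experts' current 0/1 predictions preds are maintained exactly as in the deterministic version above; names are illustrative.

<syntaxhighlight lang="python">
import random

def randomized_predict(w, preds):
    """Predict 1 with probability q1 / W, where q1 is the total weight of the
    experts predicting 1 and W is the total weight of all experts."""
    W = sum(w)
    q1 = sum(wi for wi, p in zip(w, preds) if p == 1)
    return 1 if random.random() < q1 / W else 0
</syntaxhighlight>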
Applications
Solving zero-sum games approximately (Oracle algorithm)
where <math>P</math> ranges over the distributions over rows, <math>i</math> over the rows, <math>Q</math> over the distributions over columns, and <math>j</math> over the columns.
Machine learning
Winnow algorithm
<math>\sum_{i=1}^{n} a_{ji} w_i \ge 0 \quad \text{for } j = 1, 2, \ldots, m,</math>

<math>\sum_{i=1}^{n} w_i = 1,</math>

<math>w_i \ge 0 \quad \text{for } i = 1, 2, \ldots, n.</math>
Hedge algorithm
Analysis
1. Pick the distribution <math>p_i^t = \frac{w_i^t}{\Phi_t}</math> where <math>\Phi_t = \sum_i w_i^t</math>.
2. Observe the cost of the decision, <math>m^t</math>.
3. Set <math>w_i^{t+1} = w_i^t \exp(-\eta \cdot m_i^t)</math>.
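The three steps above amount to the following minimal sketch, assuming cost vectors with entries in [0, 1] supplied by the environment; the function name and the fixed learning rate are illustrative.

<syntaxhighlight lang="python">
import numpy as np

def hedge(cost_stream, n, eta=0.1):
    """Run Hedge over n decisions on a sequence of cost vectors.

    cost_stream is an iterable of length-n arrays with entries in [0, 1].
    Returns the total expected cost incurred by the algorithm.
    """
    w = np.ones(n)                 # initialization: w_i^1 = 1
    total_cost = 0.0
    for m_t in cost_stream:
        p = w / w.sum()            # step 1: play the distribution p^t
        total_cost += p @ m_t      # step 2: observe costs and pay p^t . m^t
        w *= np.exp(-eta * m_t)    # step 3: exponential multiplicative update
    return total_cost
</syntaxhighlight>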
AdaBoost algorithm
'''Input''':
Sequence of <math>N</math> labeled examples <math>(x_1, y_1), \ldots, (x_N, y_N)</math>
Distribution <math>D</math> over the <math>N</math> examples
Weak learning algorithm '''WeakLearn'''
Integer <math>T</math> specifying number of iterations
'''Initialize''' the weight vector: <math>w_i^1 = D(i)</math> for <math>i = 1, \ldots, N</math>.
'''Do for''' <math>t = 1, 2, \ldots, T</math>:
'''1'''. Set <math>p^t = \frac{w^t}{\sum_{i=1}^{N} w_i^t}</math>.
'''2'''. Call '''WeakLearn''', providing it with the distribution <math>p^t</math>; get back a hypothesis <math>h_t : X \to [0,1]</math>.
'''3'''. Calculate the error of <math>h_t</math>: <math>\epsilon_t = \sum_{i=1}^{N} p_i^t\, |h_t(x_i) - y_i|</math>.
'''4'''. Set <math>\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}</math>.
'''5'''. Set the new weight vector to be <math>w_i^{t+1} = w_i^t\, \beta_t^{1 - |h_t(x_i) - y_i|}</math>.
'''Output''' the hypothesis:

<math>h_f(x) = \begin{cases}1 & \text{if } \sum_{t=1}^{T} \left(\log \frac{1}{\beta_t}\right) h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \log \frac{1}{\beta_t} \\ 0 & \text{otherwise}\end{cases}</math>
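For illustration, the loop above can be sketched as follows, with the weak learner passed in as a function; the interface (a weak_learn(X, y, p) returning a hypothesis h with outputs in [0, 1]) is an assumption of this sketch, not part of the original formulation.

<syntaxhighlight lang="python">
import numpy as np

def adaboost(X, y, weak_learn, T):
    """Boosting loop in the style of the pseudocode above.

    X has shape (N, d); y holds 0/1 labels of shape (N,);
    weak_learn(X, y, p) returns a hypothesis h with h(X) in [0, 1],
    trained to perform well under the example distribution p.
    Returns the combined hypothesis as a function of a feature matrix.
    """
    N = len(y)
    w = np.ones(N) / N                          # initialize weights from the uniform D
    hyps, betas = [], []
    for _ in range(T):
        p = w / w.sum()                         # step 1: normalize weights into p^t
        h = weak_learn(X, y, p)                 # step 2: call WeakLearn with p^t
        err = float(p @ np.abs(h(X) - y))       # step 3: weighted error of h_t
        if err == 0.0 or err >= 0.5:
            break                               # weak learner is perfect or no longer weak
        beta = err / (1.0 - err)                # step 4
        w = w * beta ** (1.0 - np.abs(h(X) - y))    # step 5: shrink weights of easy examples
        hyps.append(h)
        betas.append(beta)

    def final(Xq):                              # output: weighted vote of weak hypotheses
        votes = np.zeros(len(Xq))
        threshold = 0.0
        for h, b in zip(hyps, betas):
            votes += np.log(1.0 / b) * h(Xq)
            threshold += 0.5 * np.log(1.0 / b)
        return (votes >= threshold).astype(int)

    return final
</syntaxhighlight>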
Solving linear programs approximately
Problem
Given an <math>m \times n</math> matrix <math>A</math> and <math>b \in \mathbb{R}^m</math>, does there exist an <math>x</math> such that <math>Ax \ge b</math>?

<math>\exists ?\, x : Ax \ge b \qquad (1)</math>
Assumption
Solution
<math>\exists ?\, x : p^{\mathsf{T}} A x \ge p^{\mathsf{T}} b \qquad (2)</math>
Other applications
External links